Machine Translation: How Good Is It?

October 20, 2021

Introduction

Machine Translation (MT) has come a long way since its inception in the 1950s. With advances in Natural Language Processing (NLP) and the availability of vast amounts of data, MT models have become increasingly sophisticated. However, the big question remains: how good is machine translation compared with human translation?

Human Translation vs Machine Translation

Human translation is the process of converting written text from one language to another while maintaining the meaning, tone, and style of the original text. It requires a deep understanding of both the source and target languages and of their cultural nuances. Machine translation, on the other hand, uses algorithms and statistical models to translate text without human intervention.

When evaluating the quality of machine translations, various metrics are used to compare them against human translations. The most common metrics include:

BLEU (Bilingual Evaluation Understudy): A metric that measures n-gram overlap between a machine-translated sentence and one or more human reference translations. It combines the precision of matching n-grams (contiguous sequences of words, usually up to length 4) with a brevity penalty for output that is too short. Scores range from 0 to 1, often reported on a 0 to 100 scale; higher is better.
TER (Translation Error Rate): A metric that counts the minimum number of edits (insertions, deletions, substitutions, and shifts of word sequences) required to transform a machine-translated sentence into a human reference translation, divided by the length of the reference. It is usually reported as a percentage; lower is better.
METEOR (Metric for Evaluation of Translation with Explicit Ordering): A metric that aligns the words of a machine translation with those of a reference, matching exact forms as well as stems and synonyms. It combines unigram precision and recall, weighting recall more heavily, and applies a penalty when the matched words appear in a fragmented order.
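
To make the first two definitions concrete, here is a minimal sketch of sentence-level BLEU and a simplified TER in plain Python. It rests on several assumptions that are not part of the official metrics: whitespace tokenization, a single reference, no smoothing for BLEU, and no shift operation for TER (so it reduces to word-level edit distance over reference length). The function names sentence_bleu and simple_ter are illustrative only; real evaluations would normally rely on an established toolkit.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Simplified single-reference, sentence-level BLEU in [0, 1], no smoothing."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clipped matches: an n-gram counts at most as often as it occurs in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        if matches == 0 or total == 0:
            return 0.0  # without smoothing, one empty n-gram order zeroes the score
        log_precisions.append(math.log(matches / total))
    # Brevity penalty: punish hypotheses that are shorter than the reference.
    brevity_penalty = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


def simple_ter(hypothesis, reference):
    """Word-level edit distance divided by reference length (shifts are not modeled)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Dynamic-programming Levenshtein distance over words, one row at a time.
    previous = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        current = [i]
        for j, ref_word in enumerate(ref, start=1):
            substitution = previous[j - 1] + (hyp_word != ref_word)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1] / len(ref)


reference = "the cat sat on the mat"
hypothesis = "the cat sat on a mat"
print(f"BLEU: {sentence_bleu(hypothesis, reference):.3f}")
print(f"TER:  {simple_ter(hypothesis, reference):.3f}")
```

With one wrong word out of six, the simplified TER is 1/6, while BLEU drops to roughly 0.54 because the single error breaks several of the longer n-grams. METEOR is left out of the sketch because its stem and synonym matching depends on external linguistic resources.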

How Good Is Machine Translation Today?

While the quality of machine translation has improved significantly, it still lags behind human translation in accuracy, fluency, and handling of cultural context. According to a study conducted by the University of Maryland, human translations scored an average TER of 23% (lower is better), while Google Translate scored 57%, Bing Translate scored 60%, and Yandex.Translate scored 64%. Another study reported an average BLEU score of 0.65 for human translations (higher is better), compared with 0.59 for Google Translate, 0.57 for Bing Translate, and 0.55 for Systran.
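
Because TER and BLEU point in opposite directions, it can help to express each system's score as a gap to the human baseline. The sketch below does that arithmetic using only the figures quoted in the paragraph above, copied as reported and not independently verified. The helper name gaps_to_human is hypothetical, and since the two sets of numbers come from different studies, the gaps are best compared within a metric, not across metrics.

```python
# Scores as quoted in the text: TER in percent (lower is better),
# BLEU on a 0-1 scale (higher is better).
ter_scores = {"Human": 23, "Google Translate": 57, "Bing Translate": 60, "Yandex.Translate": 64}
bleu_scores = {"Human": 0.65, "Google Translate": 0.59, "Bing Translate": 0.57, "Systran": 0.55}


def gaps_to_human(scores, lower_is_better):
    """Return how much worse each system is than the human baseline, in score units."""
    human = scores["Human"]
    # Flip the sign for higher-is-better metrics so a positive gap always means "worse".
    sign = 1 if lower_is_better else -1
    return {
        name: round(sign * (value - human), 2)
        for name, value in scores.items()
        if name != "Human"
    }


print("TER gap (percentage points):", gaps_to_human(ter_scores, lower_is_better=True))
print("BLEU gap (score points):", gaps_to_human(bleu_scores, lower_is_better=False))
```

On these figures the TER gaps range from 34 to 41 percentage points, while the BLEU gaps range from 0.06 to 0.10, so the two quoted studies suggest quite different distances between machine and human output.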

Despite these gaps in quality, machine translation has made significant strides in certain domains such as technical documentation, e-commerce, and travel-related content. E-commerce platforms widely use MT to translate product descriptions and related information for customers in different regions. For instance, Amazon's MT system translates product descriptions from English to Japanese, Chinese, French, and German to enable a smoother shopping experience for customers worldwide.

Conclusion

Machine translation has advanced significantly over the years, but it still has a long way to go before it can match the quality of human translation. Metrics such as BLEU, TER, and METEOR measure the quality of machine translations against human references. While gaps remain in accuracy, fluency, and cultural context, MT has made significant strides in certain fields such as e-commerce, technical documentation, and travel-related content.
